Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts
Questions to answer: - How many hospitals are in your dataset? 219 - How many census tracts? 3445 - What coordinate reference system is each dataset in? Counties: WGS 84 / Pseudo-Mercator ; Hospitals: WGS 84 ; Tracts: NAD83 —
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)
Your Task:
# Get demographic data from ACSdata=c(over65="DP05_0024",income="DP03_0062",population="DP02_0018")d1=get_acs(state ="PA",geography ="tract",year =2023,survey ="acs5",variables = data,geometry =TRUE,output="wide")# Join to tract boundariesd1 <-left_join(census_tracts, st_drop_geometry(d1), by ="GEOID")head(d1)
Simple feature collection with 6 features and 20 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -78.42478 ymin: 39.79351 xmax: -75.93766 ymax: 40.54328
Geodetic CRS: NAD83
STATEFP COUNTYFP TRACTCE GEOIDFQ GEOID NAME.x
1 42 001 031101 1400000US42001031101 42001031101 311.01
2 42 013 100400 1400000US42013100400 42013100400 1004
3 42 013 100500 1400000US42013100500 42013100500 1005
4 42 013 100800 1400000US42013100800 42013100800 1008
5 42 013 101900 1400000US42013101900 42013101900 1019
6 42 011 011200 1400000US42011011200 42011011200 112
NAMELSAD STUSPS NAMELSADCO STATE_NAME LSAD ALAND AWATER
1 Census Tract 311.01 PA Adams County Pennsylvania CT 3043185 0
2 Census Tract 1004 PA Blair County Pennsylvania CT 993724 0
3 Census Tract 1005 PA Blair County Pennsylvania CT 1130204 0
4 Census Tract 1008 PA Blair County Pennsylvania CT 996553 0
5 Census Tract 1019 PA Blair County Pennsylvania CT 573726 0
6 Census Tract 112 PA Berks County Pennsylvania CT 1539365 9308
NAME.y over65E over65M incomeE
1 Census Tract 311.01; Adams County; Pennsylvania 995 261 64985
2 Census Tract 1004; Blair County; Pennsylvania 255 101 68929
3 Census Tract 1005; Blair County; Pennsylvania 447 89 50241
4 Census Tract 1008; Blair County; Pennsylvania 243 75 73625
5 Census Tract 1019; Blair County; Pennsylvania 578 121 16547
6 Census Tract 112; Berks County; Pennsylvania 496 111 61974
incomeM populationE populationM geometry
1 10752 4865 412 MULTIPOLYGON (((-77.03108 3...
2 17235 1660 247 MULTIPOLYGON (((-78.42478 4...
3 5060 3360 562 MULTIPOLYGON (((-78.41661 4...
4 18161 1490 333 MULTIPOLYGON (((-78.41067 4...
5 1547 1247 184 MULTIPOLYGON (((-78.40836 4...
6 6269 4123 38 MULTIPOLYGON (((-75.95433 4...
# Check for missing income valuessum(is.na(d1$incomeE))
[1] 65
# Calculate the median income across all tractsmedian_income <-median(d1$incomeE, na.rm =TRUE)median_income
[1] 72943.5
Questions to answer: - What year of ACS data are you using? 2023 ACS 5-Year Estimates. - How many tracts have missing income data? 65 - What is the median income across all PA census tracts? 72943.5 —
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)
Your Task:
# Filter for vulnerable tracts based on your criteriasummary(d1)
STATEFP COUNTYFP TRACTCE GEOIDFQ
Length:3445 Length:3445 Length:3445 Length:3445
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
GEOID NAME.x NAMELSAD STUSPS
Length:3445 Length:3445 Length:3445 Length:3445
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
NAMELSADCO STATE_NAME LSAD ALAND
Length:3445 Length:3445 Length:3445 Min. :7.224e+04
Class :character Class :character Class :character 1st Qu.:1.320e+06
Mode :character Mode :character Mode :character Median :4.183e+06
Mean :3.364e+07
3rd Qu.:2.858e+07
Max. :1.024e+09
AWATER NAME.y over65E over65M
Min. : 0 Length:3445 Min. : 0.0 Min. : 3.0
1st Qu.: 0 Class :character 1st Qu.: 429.0 1st Qu.: 94.0
Median : 13905 Mode :character Median : 665.0 Median :138.0
Mean : 433440 Mean : 718.8 Mean :156.3
3rd Qu.: 240806 3rd Qu.: 945.0 3rd Qu.:199.0
Max. :29792870 Max. :2541.0 Max. :826.0
incomeE incomeM populationE populationM
Min. : 13307 Min. : 472 Min. : 0 Min. : 7.0
1st Qu.: 57864 1st Qu.: 9324 1st Qu.: 2508 1st Qu.: 247.0
Median : 72944 Median : 13780 Median : 3524 Median : 375.0
Mean : 80731 Mean : 16219 Mean : 3647 Mean : 402.5
3rd Qu.: 96691 3rd Qu.: 20141 3rd Qu.: 4658 3rd Qu.: 527.0
Max. :250001 Max. :208341 Max. :10388 Max. :2092.0
NA's :65 NA's :73
geometry
MULTIPOLYGON :3445
epsg:4269 : 0
+proj=long...: 0
cat("Number of vulnerable tracts:", num_vulnerable, "\n")
Number of vulnerable tracts: 974
cat("Total tracts:", total_tracts, "\n")
Total tracts: 3445
cat("Percentage of vulnerable tracts:", percent_vulnerable, "%\n")
Percentage of vulnerable tracts: 28.3 %
Questions to answer: - What income threshold did you choose and why? I chose an income threshold of $32,150, based on the 2023 U.S. Federal Poverty Guideline. This provides a policy-relevant definition of low-income households.
What elderly population threshold did you choose and why? I selected 945 residents aged 65 and over as the threshold, which represents roughly the top 25% of tracts by elderly population size. These areas are likely to face higher demand for healthcare and mobility support.
How many tracts meet your vulnerability criteria? 974 tracts meet at least one of the criteria.
What percentage of PA census tracts are considered vulnerable by your definition? Approximately 28.3% of all Pennsylvania census tracts are considered vulnerable. —
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
cat("Number of vulnerable tracts >15 miles from a hospital:", num_over15, "\n")
Number of vulnerable tracts >15 miles from a hospital: 314
Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection
Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? The average distance to the nearest hospital for vulnerable tracts is approximately 13.72 miles. - What is the maximum distance? The maximum distance from a vulnerable tract to the nearest hospital is 85.18 miles. - How many vulnerable tracts are more than 15 miles from the nearest hospital? There are 314 tracts located more than 15 miles from the nearest hospital.
Projection used: EPSG:3365 – NAD83 / Pennsylvania South. It was chosen because it preserves local distances and areas for Pennsylvania, ensuring accurate measurement results in meters rather than degrees. —
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
cat("Percentage of vulnerable tracts that are underserved:", perc_underserved, "%\n")
Percentage of vulnerable tracts that are underserved: 32.2 %
Questions to answer: - How many tracts are underserved? 314 tracts are considered underserved. - What percentage of vulnerable tracts are underserved? 32.2% of all vulnerable tracts are underserved. - Does this surprise you? Why or why not? This result is not surprising. Many underserved tracts are likely located in rural or remote counties, particularly in central and northern Pennsylvania, where hospitals are sparse and travel distances are long. In contrast, urban areas such as Philadelphia and Pittsburgh have higher hospital densities, providing better healthcare accessibility. —
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Your Task:
pa_counties_proj <-st_transform(pa_counties, st_crs(underserved))tracts_with_county <-st_join(underserved, pa_counties_proj, join = st_intersects, left =FALSE)st_crs(underserved)$input
[1] "EPSG:3365"
st_crs(pa_counties_proj)$input
[1] "EPSG:3365"
tracts_with_county <-st_join(underserved, pa_counties_proj, join = st_intersects, left =FALSE)names(pa_counties)
Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population
Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? NORTHUMBERLAND,SNYDER,TIOGA,VENANGO,PIKE
Which counties have the most vulnerable people living far from hospitals? ALLEGHENY,PHILADELPHIA,BUCKS,MONTGOMERY,LANCASTER
Are there any patterns in where underserved counties are located? Underserved counties are mainly located in rural northern and central Pennsylvania, where low population density and limited hospital infrastructure create significant barriers to healthcare access. —
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
Your Task:
# Create and format priority counties tablepriority_counties <- county_summary %>%arrange(desc(perc_underserved), desc(avg_distance_miles)) %>%slice_head(n =10) %>%mutate(perc_underserved =paste0(perc_underserved, "%"),avg_distance_miles =round(avg_distance_miles, 2),total_vulnerable_pop =formatC(total_vulnerable_pop, format ="d", big.mark =","))library(kableExtra)kable( priority_counties,col.names =c("County", "Vulnerable Tracts", "Underserved Tracts", "% Underserved", "Avg Distance (miles)", "Total Vulnerable Population"),caption ="Table 1. Top 10 Pennsylvania Counties with the Greatest Need for Healthcare Investment",align ="c") %>%kable_styling(bootstrap_options =c("striped", "hover", "condensed", "responsive"),full_width =FALSE,position ="center",font_size =13 ) %>%row_spec(0, bold =TRUE, background ="darkgrey", color ="white") %>%row_spec(1:10, background ="#f8f9fa") %>%add_header_above(c(" "=1, "Tract Counts"=2, "Accessibility & Population"=3))
Table 1. Top 10 Pennsylvania Counties with the Greatest Need for Healthcare Investment
Tract Counts
Accessibility & Population
County
Vulnerable Tracts
Underserved Tracts
% Underserved
Avg Distance (miles)
Total Vulnerable Population
PIKE
2
2
100%
77.36
8,286
WYOMING
1
1
100%
63.59
4,185
MCKEAN
1
1
100%
46.55
5,054
SNYDER
5
5
100%
44.30
27,456
NORTHUMBERLAND
5
5
100%
35.49
27,608
GREENE
1
1
100%
30.53
5,359
JUNIATA
1
1
100%
28.96
5,560
VENANGO
3
3
100%
28.69
12,686
TIOGA
4
4
100%
28.15
17,895
WARREN
1
1
100%
27.56
5,066
Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
library(ggplot2)library(dplyr)library(sf)county_map <- pa_counties %>%st_transform(3365) %>%left_join(county_summary, by =c("COUNTY_NAM"="COUNTY_NAM"))hospital_points <- pa_hospital %>%st_transform(3365)ggplot() +geom_sf(data = county_map, aes(fill = perc_underserved), color ="white", size =0.3) +geom_sf(data = hospital_points, color ="pink", size =1, alpha =0.7) +scale_fill_gradient(name ="% Underserved\nVulnerable Tracts",low ="lightgrey", high ="black",labels =function(x) paste0(x, "%") ) +labs(title ="Healthcare Accessibility Across Pennsylvania Counties",subtitle ="Percentage of vulnerable tracts underserved and hospital locations",caption ="Data source: 2023 ACS 5-Year Estimates & Pennsylvania Department of Health" ) +theme_void() +theme(plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12, margin =margin(b =10)),plot.caption =element_text(size =9, color ="gray40"),legend.title =element_text(size =10, face ="bold"),legend.text =element_text(size =9),legend.position ="right" )
Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create a clean dataframe of distancesdist_df <- dist %>%st_drop_geometry() %>%select(dist_miles)ggplot(dist_df, aes(x = dist_miles)) +geom_histogram(bins =30,fill ="lightpink",color ="white",alpha =0.8 ) +labs(title ="Distribution of Distances to the Nearest Hospital (Vulnerable Tracts)",x ="Distance to Nearest Hospital (miles)",y ="Number of Vulnerable Census Tracts",caption ="Most vulnerable tracts are within 15 miles of a hospital, but a right tail shows rural access challenges." ) +theme_minimal(base_size =13) +theme(plot.title =element_text(size =15, face ="bold"),axis.title =element_text(face ="bold"),plot.caption =element_text(size =10, color ="gray40") )
Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size
Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Challenge Options
Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).
Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question
School Safety Zones - Data: Schools, Crime Incidents, Bike Network - Question: “Are school zones safe for walking/biking, or are they crime hotspots?” - Operations: Buffer schools (1000ft safety zone), spatial join with crime incidents, assess bike infrastructure coverage - Policy relevance: Safe Routes to School programs, crossing guard placement
Data Sources
OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.
Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries
Recommended Starting Points
If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics
If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations
Your Analysis
Your Task:
Find and load additional data
Document your data source
Check and standardize the CRS
Provide basic summary statistics
# Load your additional datasetlibrary(sf)library(tidyverse)library(tmap)schools <-st_read("/Users/cathy/GitHub/portfolio-setup-uxiaoo22/Assignments/Assignment_2/Schools.geojson")
Reading layer `Schools' from data source
`/Users/cathy/GitHub/portfolio-setup-uxiaoo22/Assignments/Assignment_2/Schools.geojson'
using driver `GeoJSON'
Simple feature collection with 495 features and 14 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -75.2665 ymin: 39.90781 xmax: -74.97057 ymax: 40.12974
Geodetic CRS: WGS 84
Reading layer `PhiladelphiaBikeNetwork201204' from data source
`/Users/cathy/GitHub/portfolio-setup-uxiaoo22/Assignments/Assignment_2/PhiladelphiaBikeNetwork_SupportingDatasets201209/BikeNetwork_SupportingDatasets201209/PhiladelphiaBikeNetwork201204.shp'
using driver `ESRI Shapefile'
Simple feature collection with 3719 features and 7 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 2663613 ymin: 207711.1 xmax: 2747362 ymax: 299020.2
Projected CRS: NAD83 / Pennsylvania South (ftUS)
# Convert all to same projected CRSschools <-st_transform(schools, 2272)crime <-st_transform(crime, 2272)bike_network <-st_transform(bike_network, 2272)st_crs(schools)
Coordinate Reference System:
User input: EPSG:2272
wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
BASEGEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]],
CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
METHOD["Lambert Conic Conformal (2SP)",
ID["EPSG",9802]],
PARAMETER["Latitude of false origin",39.3333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8821]],
PARAMETER["Longitude of false origin",-77.75,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8822]],
PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8824]],
PARAMETER["Easting at false origin",1968500,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8826]],
PARAMETER["Northing at false origin",0,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8827]]],
CS[Cartesian,2],
AXIS["easting (X)",east,
ORDER[1],
LENGTHUNIT["US survey foot",0.304800609601219]],
AXIS["northing (Y)",north,
ORDER[2],
LENGTHUNIT["US survey foot",0.304800609601219]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
BBOX[39.71,-80.53,41.18,-74.72]],
ID["EPSG",2272]]
st_crs(crime)
Coordinate Reference System:
User input: EPSG:2272
wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
BASEGEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]],
CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
METHOD["Lambert Conic Conformal (2SP)",
ID["EPSG",9802]],
PARAMETER["Latitude of false origin",39.3333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8821]],
PARAMETER["Longitude of false origin",-77.75,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8822]],
PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8824]],
PARAMETER["Easting at false origin",1968500,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8826]],
PARAMETER["Northing at false origin",0,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8827]]],
CS[Cartesian,2],
AXIS["easting (X)",east,
ORDER[1],
LENGTHUNIT["US survey foot",0.304800609601219]],
AXIS["northing (Y)",north,
ORDER[2],
LENGTHUNIT["US survey foot",0.304800609601219]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
BBOX[39.71,-80.53,41.18,-74.72]],
ID["EPSG",2272]]
st_crs(bike_network)
Coordinate Reference System:
User input: EPSG:2272
wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
BASEGEOGCRS["NAD83",
DATUM["North American Datum 1983",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4269]],
CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
METHOD["Lambert Conic Conformal (2SP)",
ID["EPSG",9802]],
PARAMETER["Latitude of false origin",39.3333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8821]],
PARAMETER["Longitude of false origin",-77.75,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8822]],
PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8823]],
PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8824]],
PARAMETER["Easting at false origin",1968500,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8826]],
PARAMETER["Northing at false origin",0,
LENGTHUNIT["US survey foot",0.304800609601219],
ID["EPSG",8827]]],
CS[Cartesian,2],
AXIS["easting (X)",east,
ORDER[1],
LENGTHUNIT["US survey foot",0.304800609601219]],
AXIS["northing (Y)",north,
ORDER[2],
LENGTHUNIT["US survey foot",0.304800609601219]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
BBOX[39.71,-80.53,41.18,-74.72]],
ID["EPSG",2272]]
nrow(schools); nrow(crime); nrow(bike_network)
[1] 495
[1] 120567
[1] 3719
Questions to answer: - What dataset did you choose and why? I selected three datasets, Philadelphia Schools, Crime Incidents, and the Philadelphia Bike Network for analyzing school safety environments. This combination allows an examination of whether schools are located in areas with high crime density and limited bicycle infrastructure, which is relevant for understanding walkability and student safety.
What is the data source and date? All datasets were obtained from OpenDataPhilly. Schools: Data includes points identifying public, charter, private, and archdiocesan schools, as well as school annexes, athletic fields, and facilities. It supports the Streets Department’s school signage and crosswalk initiatives. Last updated February 2, 2024.
Crime Incidents: Provided by the Philadelphia Police Department (2025), including Part I crimes such as aggravated assault, rape, and arson.
Bike Network – Supporting Datasets: Compiled by the Philadelphia City Planning Commission (April 2012), integrating data from multiple city departments to support bike network routing and documentation.
How many features does it contain? Schools: 495 point features Crime incidents: 120,567 point features Bike network: 3,719 line features
What CRS is it in? Did you need to transform it? All datasets were standardized to EPSG:2272 – NAD83 / Pennsylvania South (ftUS). The Schools dataset was originally in WGS 84 (EPSG:4326) and was transformed to EPSG:2272 to ensure consistent spatial analysis and accurate distance or buffer measurements in feet.
Pose a research question
Research Question:Are schools in high-crime areas lacking nearby bicycle infrastructure?
Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Table 1. Summary Statistics of School Safety and Bicycle Infrastructure in Philadelphia
Total Schools
At-Risk Schools
Average Crimes Near Schools
495
52
1083.9
# Drop geometry and clean dataat_risk_table <- at_risk_schools %>%st_drop_geometry() %>%filter(!is.na(school_name_label)) %>%select(`School Name`= school_name_label,`Crime Incidents`= crime_count,`Nearby Bike Lane`= has_bike ) %>%mutate(`Crime Incidents`=formatC(`Crime Incidents`, format ="d", big.mark =","),`Nearby Bike Lane`=if_else(`Nearby Bike Lane`==1, "Yes", "No") ) %>%arrange(desc(as.numeric(`Crime Incidents`))) %>%slice_head(n =10)# Output tableat_risk_table %>%kbl(caption ="Table 2. Top 10 At-Risk Schools in Philadelphia: High Crime and No Nearby Bicycle Infrastructure",align =c("l", "c", "c"),booktabs =TRUE ) %>%kable_styling(full_width =FALSE,bootstrap_options =c("striped", "hover", "condensed", "responsive") ) %>%column_spec(1, bold =TRUE, width ="5cm") %>%column_spec(2:3, width ="3cm") %>%add_header_above(c(" "=1, "Safety Risk Indicators"=2))
Table 2. Top 10 At-Risk Schools in Philadelphia: High Crime and No Nearby Bicycle Infrastructure
Safety Risk Indicators
School Name
Crime Incidents
Nearby Bike Lane
TECH FREIRE CHARTER SCHOOL
138
No
ALLIANCE FOR PROGRESS CHARTER SCHOOL
128
No
ALLIANCE FOR PROGRESS CHARTER SCHOOL (ANNEX)
121
No
ISAAC A. SHEPPARD SCHOOL
113
No
ONE BRIGHT RAY - FAIRHILL CAMPUS
112
No
YOUTHBUILD PHILADELPHIA CHARTER SCHOOL
109
No
GENERAL GEORGE G. MEADE SCHOOL
100
No
KIPP NORTH PHILADELPHIA ACADEMY
95
No
DEEP ROOTS CHARTER SCHOOL
95
No
PEOPLE FOR PEOPLE CHARTER SCHOOL
91
No
Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)
Your interpretation: The analysis found that 52 out of 495 schools in Philadelphia are located in areas with both high crime density and no nearby bicycle infrastructure, posing potential safety concerns for students walking or biking to school.Most at-risk schools are concentrated in North and West Philadelphia, where crime incidents are more frequent. The absence of bicycle lanes near these schools may reduce safe commuting options for students and discourage active travel.Integrating safer bike routes and crime prevention strategies around schools could significantly improve student safety and accessibility.
Finally - A few comments about your incorporation of feedback!
I focused on simpler, cleaner map design and improved table formatting for easier interpretation.
Submission Requirements
What to submit:
Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
Use embed-resources: true in YAML so it’s a single file
All code should run without errors
All maps and charts should display correctly
File naming:LastName_FirstName_Assignment2.html and LastName_FirstName_Assignment2.qmd